AITopics | refinement process

Collaborating Authors

refinement process

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

Neural Information Processing SystemsMar-20-2026, 01:21:33 GMT

Large Language Models (LLMs) have shown remarkable performance in various natural language tasks, but they often struggle with planning problems that require structured reasoning. To address this limitation, the conversion of planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution, enabling the use of automated planners. However, generating accurate PDDL files typically demands human inputs or correction, which can be time-consuming and costly. In this paper, we propose a novel approach that leverages LLMs and environment feedback to automatically generate PDDL domain and problem description files without the need for human intervention. Our method introduces an iterative refinement process that generates multiple problem PDDL candidates and progressively refines the domain PDDL based on feedback obtained from interacting with the environment. To guide the refinement process, we develop an Exploration Walk (EW) metric, which provides rich feedback signals for LLMs to update the PDDL file. We evaluate our approach on $10$ PDDL environments. We achieve an average task solve rate of 66\% compared to a 29\% solve rate by GPT-4's intrinsic planning with chain-of-thought prompting. Our work enables the automated modeling of planning environments using LLMs and environment feedback, eliminating the need for human intervention in the PDDL translation process and paving the way for more reliable LLM agents in challenging problems.

artificial intelligence, natural language, proceedings, (9 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.59)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback A Additional Results

Neural Information Processing SystemsOct-9-2025, 09:39:10 GMT

Each image is a given rating between 1 and 5 (where 1 represents that'image is semantically

input prompt, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis

Gao, Jialin, Zhou, Donghao, Liang, Mingjian, Liu, Lihao, Fu, Chi-Wing, Hu, Xiaowei, Heng, Pheng-Ann

arXiv.org Artificial IntelligenceOct-3-2025

3D indoor layout synthesis is crucial for creating virtual environments. Traditional methods struggle with generalization due to fixed datasets. While recent LLM and VLM-based approaches offer improved semantic richness, they often lack robust and flexible refinement, resulting in suboptimal layouts. We develop DisCo-Layout, a novel framework that disentangles and coordinates physical and semantic refinement. For independent refinement, our Semantic Refinement Tool (SRT) corrects abstract object relationships, while the Physical Refinement Tool (PRT) resolves concrete spatial issues via a grid-matching algorithm. For collaborative refinement, a multi-agent framework intelligently orchestrates these tools, featuring a planner for placement rules, a designer for initial layouts, and an evaluator for assessment. Experiments demonstrate DisCo-Layout's state-of-the-art performance, generating realistic, coherent, and generalizable 3D indoor layouts. Our code will be publicly available.

artificial intelligence, layout, refinement, (14 more...)

arXiv.org Artificial Intelligence

2510.02178

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Small sample-based adaptive text classification through iterative and contrastive description refinement

Rajeev, Amrit, Avadhanam, Udayaadithya, Tulapurkar, Harshula, Sundar, SaiBarath

arXiv.org Artificial IntelligenceAug-5-2025

Zero-shot text classification remains a difficult task in domains with evolving knowledge and ambiguous category boundaries, such as ticketing systems. Large language models (LLMs) often struggle to generalize in these scenarios due to limited topic separability, while few-shot methods are constrained by insufficient data diversity. We propose a classification framework that combines iterative topic refinement, contrastive prompting, and active learning. Starting with a small set of labeled samples, the model generates initial topic labels. Misclassified or ambiguous samples are then used in an iterative contrastive prompting process to refine category distinctions by explicitly teaching the model to differentiate between closely related classes. The framework features a human-in-the-loop component, allowing users to introduce or revise category definitions in natural language. This enables seamless integration of new, unseen categories without retraining, making the system well-suited for real-world, dynamic environments. The evaluations on AGNews and DBpedia demonstrate strong performance: 91% accuracy on AGNews (3 seen, 1 unseen class) and 84% on DBpedia (8 seen, 1 unseen), with minimal accuracy shift after introducing unseen classes (82% and 87%, respectively). The results highlight the effectiveness of prompt-based semantic reasoning for fine-grained classification with limited supervision.

category, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.00957

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Generalized Derangetropy Functionals for Modeling Cyclical Information Flow

Ataei, Masoud, Wang, Xiaogang

arXiv.org Artificial IntelligenceJun-17-2025

This paper introduces a framework for modeling cyclical and feedback-driven information flow through a generalized family of entropy-modulated transformations called derangetropy functionals. Unlike scalar and static entropy measures such as Shannon entropy, these functionals act directly on probability densities and provide a topographical representation of information structure across the support of the distribution. The framework captures periodic and self-referential aspects of information distribution and encodes them through functional operators governed by nonlinear differential equations. When applied recursively, these operators induce a spectral diffusion process governed by the heat equation, leading to convergence toward a Gaussian characteristic function. This convergence theorem provides a unified analytical foundation for describing the long-term dynamics of information under cyclic modulation. The proposed framework offers new tools for analyzing the temporal evolution of information in systems characterized by periodic structure, stochastic feedback, and delayed interaction, with applications in artificial neural networks, communication theory, and non-equilibrium statistical mechanics.

artificial intelligence, machine learning, transformation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/e27060608

2504.14605

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

Add feedback

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Wang, Yuchi, Cai, Yishuo, Ren, Shuhuai, Yang, Sihan, Yao, Linli, Liu, Yuanxin, Zhang, Yuanxing, Wan, Pengfei, Sun, Xu

arXiv.org Artificial IntelligenceMay-29-2025

Image recaptioning is widely used to generate training datasets with enhanced quality for various multimodal tasks. Existing recaptioning methods typically rely on powerful multimodal large language models (MLLMs) to enhance textual descriptions, but often suffer from inaccuracies due to hallucinations and incompleteness caused by missing fine-grained details. To address these limitations, we propose RICO, a novel framework that refines captions through visual reconstruction. Specifically, we leverage a text-to-image model to reconstruct a caption into a reference image, and prompt an MLLM to identify discrepancies between the original and reconstructed images to refine the caption. This process is performed iteratively, further progressively promoting the generation of more faithful and comprehensive descriptions. To mitigate the additional computational cost induced by the iterative process, we introduce RICO-Flash, which learns to generate captions like RICO using DPO. Extensive experiments demonstrate that our approach significantly improves caption accuracy and completeness, outperforms most baselines by approximately 10% on both CapsBench and CompreCap. Code released at https://github.com/wangyuchi369/RICO.

caption, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.22613

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

Neural Information Processing SystemsMay-26-2025, 23:14:33 GMT

artificial intelligence, natural language, planning & scheduling, (8 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Gaze-based Task Decomposition for Robot Manipulation in Imitation Learning

Takizawa, Ryo, Ohmura, Yoshiyuki, Kuniyoshi, Yasuo

arXiv.org Artificial IntelligenceFeb-1-2025

In imitation learning for robotic manipulation, decomposing object manipulation tasks into multiple sub-tasks is essential. This decomposition enables the reuse of learned skills in varying contexts and the combination of acquired skills to perform novel tasks, rather than merely replicating demonstrated motions. Gaze plays a critical role in human object manipulation, where it is strongly correlated with hand movements. We hypothesize that an imitating agent's gaze control, fixating on specific landmarks and transitioning between them, simultaneously segments demonstrated manipulations into sub-tasks. In this study, we propose a simple yet robust task decomposition method based on gaze transitions. The method leverages teleoperation, a common modality in robotic manipulation for collecting demonstrations, in which a human operator's gaze is measured and used for task decomposition as a substitute for an imitating agent's gaze. Notably, our method achieves consistent task decomposition across all demonstrations for each task, which is desirable in contexts such as machine learning. We applied this method to demonstrations of various tasks and evaluated the characteristics and consistency of the resulting sub-tasks. Furthermore, through extensive testing across a wide range of hyperparameter variations, we demonstrated that the proposed method possesses the robustness necessary for application to different robotic systems.

artificial intelligence, decomposition, demonstration, (15 more...)

arXiv.org Artificial Intelligence

2501.15071

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.84)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.51)

Add feedback

On Synthetic Texture Datasets: Challenges, Creation, and Curation

Hoak, Blaine, McDaniel, Patrick

arXiv.org Artificial IntelligenceSep-16-2024

The influence of textures on machine learning models has been an ongoing investigation, specifically in texture bias/learning, interpretability, and robustness. However, due to the lack of large and diverse texture data available, the findings in these works have been limited, as more comprehensive evaluations have not been feasible. Image generative models are able to provide data creation at scale, but utilizing these models for texture synthesis has been unexplored and poses additional challenges both in creating accurate texture images and validating those images. In this work, we introduce an extensible methodology and corresponding new dataset for generating high-quality, diverse texture images capable of supporting a broad set of texture-based tasks. Our pipeline consists of: (1) developing prompts from a range of descriptors to serve as input to text-to-image models, (2) adopting and adapting Stable Diffusion pipelines to generate and filter the corresponding images, and (3) further filtering down to the highest quality images. Through this, we create the Prompted Textures Dataset (PTD), a dataset of 362,880 texture images that span 56 textures. During the process of generating images, we find that NSFW safety filters in image generation pipelines are highly sensitive to texture (and flag up to 60\% of our texture images), uncovering a potential bias in these models and presenting unique challenges when working with texture data. Through both standard metrics and a human evaluation, we find that our dataset is high quality and diverse.

dataset, texture, texture image, (14 more...)

arXiv.org Artificial Intelligence

2409.10297

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Filters

Collaborating Authors

refinement process

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

dfd0bd56e8a6f82d1619f5d093d5f9ca-Supplemental-Conference.pdf

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback A Additional Results

DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis

Small sample-based adaptive text classification through iterative and contrastive description refinement

Generalized Derangetropy Functionals for Modeling Cyclical Information Flow

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

Gaze-based Task Decomposition for Robot Manipulation in Imitation Learning

On Synthetic Texture Datasets: Challenges, Creation, and Curation